Characterization of clonal immunoglobulin heavy V-D-J gene rearrangements in Chinese patients with chronic lymphocytic leukemia: Clinical features and molecular profiles

Introduction: Several prognostic factors of chronic lymphocytic leukemia (CLL) have been identified, such as cytogenetic aberrations and recurrent gene mutations. B-cell receptor (BCR) signaling plays an important role in the tumorigenesis of CLL, and its clinical significance in predicting prognosis is also under study. Methods: We assessed the already-known prognostic markers, immunoglobulin heavy chain (IGH) gene usage and the associations among these factors in 71 patients diagnosed with CLL in our center from October 2017 to March 2022. Sequencing of IGH gene rearrangements was performed using Sanger sequencing or IGH-based next-generation sequencing, and the results were further analyzed for distinct IGHV/IGHD/IGHJ genes and the mutational status of the clonotypic IGHV (IGH variable) gene. Results: By analyzing the distribution of potential prognostic factors in CLL patients, we displayed a landscape of molecular profiles, confirmed the predictive value of recurrent genetic mutations and chromosome aberrations, and found that IGHJ3 was associated with favorable markers (mutated IGHV, trisomy 12), while IGHJ6 tended to correlate with unfavorable factors (unmutated IGHV, del17p). Discussion: These results provided an indication for IGH gene sequencing in predicting the prognosis of CLL.


Introduction
Chronic lymphocytic leukemia (CLL) is the most prevalent adult leukemia in the Western world. The clinical course of CLL is highly heterogeneous: most cases follow a relatively indolent course, while a minority show aggressive progression or malignant transformation despite intensive treatment (1). In recent decades, several predictive markers of CLL prognosis have been identified. Clinical characteristics that may adversely influence the outcome of CLL include age (>65 years old), serum β2-microglobulin level (>3.5 mg/L), and Rai stage (I-IV) or Binet stage (B-C). Immunogenetic factors that may be involved in the prediction of prognosis include cytogenetic aberrations (del17p, del11q, del13q, and tri12), gene mutations (mutated TP53, ATM, NOTCH1, etc.), and unmutated status of the IGHV region (2).
Many studies have focused on IGHV, IGHD, and IGHJ gene usage or stereotyped major subsets of BCRs in CLL patients (3)(4)(5). BCR stereotypy refers to highly similar alignment of the complementarity determining region 3 (CDR3) region in the heavy (HCDR3) or light chain (LCDR3), which was found in approximately 41% of CLL patients. A total of 29 major subsets have been identified to date, each accompanied by several satellite subsets. CLL patients with BCR stereotypy showed distinct clinical profiles; for example, the subset #2 subgroup exhibited aggressive behavior independent of the mutational status of the clonotypic IGHV gene, while the IGHV4-34-containing subsets #4 and #16 had a more indolent progression. The prevalence of stereotyped BCR in CLL patients indicated a critical role of BCR signaling in the pathogenesis of this disease. The features of V(D)J rearrangements in Western CLL patients have been well studied and described; however, those in Chinese CLL patients are not clearly and systematically described. The main aim of the research was to describe IGHV, IGHD, and IGHJ genes and their correlation with other confirmed prognostic markers in Chinese CLL patients. This integrative demonstration could help better stratify patients and provide information about the mechanism of BCR signaling in CLL pathogenesis. Strong correlations between IGHV, IGHD, and IGHJ genes in CLL malignant clones and other different risk-stratified or prognostic biomarkers were well established in several large series of CLL patients (6)(7)(8). 
To verify and clarify these specific correlations in our cohort of Chinese CLL patients, we performed fluorescence in situ hybridization (FISH), karyotype analysis, and targeted next-generation sequencing (NGS) for gene mutations, combined with Sanger sequencing or NGS, to identify clonal B-cell immunoglobulin heavy chain (IGH) gene rearrangements and to assess the extent of somatic hypermutation (SHM) in the IGHV sequence in 71 CLL patients in our center. Then, we analyzed the possible associations and tried to summarize a landscape for subsequent mechanistic studies.

Materials and methods
Patients
This study included 71 CLL patients diagnosed by iwCLL criteria with results of V(D)J rearrangements in Tongji Hospital from October 2017 to March 2022 (9). A total of 71 peripheral blood (PB), bone marrow (BM) or lymph node (LN) samples were collected and underwent cytogenetic analysis, including 30 cases examined by karyotyping and 50 cases examined by FISH. Clinicopathologic features, including age (71/71), sex (71/71), and Rai stage or Binet stage (37/71), were retrospectively collected. Samples with incomplete or inconclusive results of V(D)J rearrangements were excluded from this study. All samples were collected after written informed consent was obtained in accordance with the Declaration of Helsinki.
Morphology and immunophenotyping analysis
PB smears were stained using a standard Wright-Giemsa protocol and then prepared for manual 100-cell differential white blood cell count. BM aspirate smears were prepared for 200-cell differential white blood cell count, focusing on lymphocytes. The panel of monoclonal antibodies used in immunophenotyping analysis included CD5, CD10, CD11c, CD19, CD20, CD22, CD23, CD25, CD38, CD45, CD79b, FMC-7, CD2, CD3, CD4, CD7, CD8, CD56, and immunoglobulin κ and λ light chains (BD Biosciences). CLL scores were calculated based on the system proposed by Matutes et al. (10). Patients with a score of 4-5 were considered typical, while patients with a score of 3 were considered atypical. Patients with a score below 3 were excluded from this study.
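The score-based inclusion rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds; the function name `classify_matutes` is hypothetical, and the calculation of the score itself from individual markers is not reproduced here.

```python
def classify_matutes(score: int) -> str:
    """Map a Matutes CLL score (0-5) to the category used in this study.

    Thresholds follow the text: 4-5 typical, 3 atypical, below 3 excluded.
    """
    if not 0 <= score <= 5:
        raise ValueError("Matutes score must be between 0 and 5")
    if score >= 4:
        return "typical"
    if score == 3:
        return "atypical"
    return "excluded"


if __name__ == "__main__":
    for s in range(6):
        print(s, classify_matutes(s))
```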

Conventional cytogenetics
Conventional cytogenetic analysis was performed in a total of 30 cases. Cells in collected samples were cultured at 1×10^6 cells/mL in the presence of CpG-oligonucleotide DSP30 (2 mmol/L) and IL-2 (0.2 mg/mL) for 72 hours. The banding process was then performed following standard protocols. When possible, at least 20 metaphases were analyzed, and the number of metaphases analyzed was recorded in the report. The karyotype was analyzed based on the International System for Human Cytogenetic Nomenclature (ISCN 2016). An abnormal clone was identified by structural aberrations, including chromosome gains observed in at least two different metaphases or chromosome losses observed in at least three different metaphases. A complex karyotype was defined as a clone that possesses at least three independent abnormalities at the same time.
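The clone-calling and complex-karyotype rules above can be expressed as a small Python sketch. The `Aberration` record and its field names are hypothetical, and the sketch assumes that structural aberrations share the two-metaphase threshold stated for chromosome gains; the text does not spell this out separately.

```python
from dataclasses import dataclass

# Minimum number of metaphases needed to call an aberration clonal.
# Gains: 2 and losses: 3 follow the text; "structural": 2 is an assumption.
MIN_METAPHASES = {"gain": 2, "structural": 2, "loss": 3}


@dataclass
class Aberration:
    name: str        # free-text label, e.g. "+12" (illustrative only)
    kind: str        # "gain", "loss", or "structural"
    metaphases: int  # number of metaphases showing this aberration


def clonal_aberrations(aberrations):
    """Keep only aberrations seen in enough metaphases to count as clonal."""
    return [a for a in aberrations if a.metaphases >= MIN_METAPHASES[a.kind]]


def is_complex_karyotype(aberrations) -> bool:
    """Complex karyotype: at least three independent clonal abnormalities."""
    return len(clonal_aberrations(aberrations)) >= 3


if __name__ == "__main__":
    # Hypothetical example, not patient data.
    example = [
        Aberration("+12", "gain", 2),
        Aberration("-17", "loss", 2),          # below the loss threshold
        Aberration("del(13q)", "structural", 5),
        Aberration("-Y", "loss", 3),
    ]
    print([a.name for a in clonal_aberrations(example)])
    print(is_complex_karyotype(example))
```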

FISH
Interphase FISH was performed without sorting on 200 cells from 50 cases from PB, BM or LN samples using a commercial probe panel (MetaSystems, Altlussheim, Germany). Panels included TP53/CEP17 (del17p), ATM (del11q), trisomy 12, D13S29 (del13q), and translocations between IGH and the partner gene (IGH/CCND1). The t(11;14)(q13;q32) IGH/CCND1 dual-color dual-fusion translocation probe was used to identify and rule out cases of mantle cell lymphoma (MCL). Cases with del13q alone or absence of FISH aberrations were categorized as a favorable group according to the Döhner FISH classification (11), while cases with del11q or del17p were considered unfavorable. The group of patients with trisomy 12 was considered to have an intermediate prognosis. The probe cutoff values were set as 5% for the deletion probe, 3% for the trisomy probe, and 1% for the dual-color dual-fusion probe.
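As a rough illustration, the probe cutoffs and the risk grouping described above could be applied as follows. The dictionary keys, function names, and the choice of a strict "greater than" comparison at the cutoff are assumptions made for this sketch; the text gives only the cutoff percentages and the grouping rules.

```python
# Cutoffs from the text: 5% deletion, 3% trisomy, 1% dual-fusion probes.
CUTOFFS = {"deletion": 0.05, "trisomy": 0.03, "fusion": 0.01}

# Probe labels are illustrative shorthand for the panel described above.
PROBE_TYPE = {
    "del17p": "deletion",
    "del11q": "deletion",
    "del13q": "deletion",
    "tri12": "trisomy",
    "IGH/CCND1": "fusion",
}


def positive_probes(abnormal_cells, cells_scored=200):
    """Return probes whose abnormal-cell fraction exceeds the cutoff.

    abnormal_cells maps probe label -> number of abnormal cells counted.
    A strict '>' comparison is assumed here.
    """
    return {
        probe
        for probe, n in abnormal_cells.items()
        if n / cells_scored > CUTOFFS[PROBE_TYPE[probe]]
    }


def fish_risk_group(positives):
    """Group a case as described in the text (after Dohner et al.)."""
    if "IGH/CCND1" in positives:
        return "possible MCL"  # such cases were ruled out of the CLL cohort
    if "del17p" in positives or "del11q" in positives:
        return "unfavorable"
    if "tri12" in positives:
        return "intermediate"
    return "favorable"  # del13q alone or no FISH aberration


if __name__ == "__main__":
    # Hypothetical counts out of 200 scored cells, not patient data.
    counts = {"del13q": 40, "tri12": 4, "del17p": 0}
    pos = positive_probes(counts)
    print(pos, fish_risk_group(pos))
```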

Sanger sequencing and IGH-based NGS of the V(D)J rearrangements
Sequencing of IGH gene rearrangements was performed in all 71 cases. Sanger sequencing of IGH gene rearrangements was performed in 58 cases, and NGS of IGH gene rearrangements was performed in the other 13 cases. Sanger sequencing and IGH-based NGS were performed on fresh, frozen or FFPE samples acquired from PB, BM or LNs (Supplementary Table 1). In the Sanger sequencing process, PCR amplification of IGHV-IGHD-IGHJ rearrangements was performed using either genomic DNA (gDNA) or cDNA as described in the InVivoScribe IGH Somatic Hypermutation Assay v2.0 - ABI Fluorescence Detection. Sequence information was analyzed based on direct sequencing of both strands using the IMGT databases and the IMGT/V-QUEST tool (http://www.imgt.org). Only productive rearrangements were parsed, reorganized, and reported. Information was sorted and reported as Ig gene repertoires, VH CDR3 length, exact amino acid sequence and SHM. In the IGH-based NGS process, the CLL clonotype was established in the tumor sample using locus-specific primer sets for IGHV, IGHD, and IGHJ rearrangements and the MiSeq Illumina platform (LymphoTrack Dx IGHV Leader Somatic Hypermutation Assay - MiSeq). The output was then further analyzed based on the international ImMunoGeneTics (IMGT) information system to identify the exact IGHV-D-J sequence and the corresponding frequency, as previously described (12). In a patient sample, a clonotype present at a frequency of higher than 5% of all rearranged V(D)J sequences was identified as a malignant clone. The clone present at the highest frequency in the tumor sample was named the "calibrating" or "index" clone.
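The 5% clonotype rule for the NGS data can be sketched as below. This is a minimal illustration, assuming read counts per clonotype are already available as a dictionary; the function name `call_clones` and the input layout are hypothetical and do not reflect the actual LymphoTrack output format.

```python
def call_clones(clonotype_reads, threshold=0.05):
    """Identify malignant clones and the index clone from read counts.

    clonotype_reads maps a clonotype label (e.g. a V-D-J string) to its
    read count. A clonotype is called a malignant clone when its share of
    all rearranged V(D)J reads is higher than `threshold` (5% in the text).
    Returns (dict of clone -> frequency, index clone or None).
    """
    total = sum(clonotype_reads.values())
    if total == 0:
        return {}, None
    freqs = {c: n / total for c, n in clonotype_reads.items()}
    clones = {c: f for c, f in freqs.items() if f > threshold}
    # The index ("calibrating") clone is the one with the highest frequency.
    index_clone = max(clones, key=clones.get) if clones else None
    return clones, index_clone


if __name__ == "__main__":
    # Hypothetical read counts, not patient data.
    reads = {"clonotype_A": 900, "clonotype_B": 60, "clonotype_C": 40}
    clones, index_clone = call_clones(reads)
    print(sorted(clones), index_clone)
```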
The mutational status of the clonotypic IGHV gene was defined as follows: 1) a clone carrying IGHV sequences that exhibited ≥98% homology compared to germline sequences was considered unmutated; 2) a clone carrying IGHV sequences that exhibited <98% homology compared to the germline sequences was considered mutated; and 3) if there were double rearrangements, the mutational status of the IGHV gene in a CLL patient was determined by the suggested double rearrangement SHM interpretation criteria (ERIC) (13). Major-subset stereotyped BCRs were identified by the Arrest/assign Tool, in which the basic algorithms were established according to the study by Agathangelidis et al. (14).
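The 98% cutoff can be illustrated with a short Python sketch. In the study the identity to germline came from IMGT/V-QUEST; the `percent_identity` helper below is a simplified, hypothetical stand-in that compares two pre-aligned, equal-length sequences, and the ERIC rules for double rearrangements are not covered.

```python
def percent_identity(query: str, germline: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Simplified stand-in for the identity value reported by an alignment
    tool; gaps and partial coverage are not handled.
    """
    if len(query) != len(germline) or not query:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == g for q, g in zip(query.upper(), germline.upper()))
    return 100.0 * matches / len(query)


def ighv_status(identity_percent: float) -> str:
    """Apply the cutoff from the text: >=98% unmutated, <98% mutated."""
    return "unmutated" if identity_percent >= 98.0 else "mutated"


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(percent_identity("ACGT", "ACGA"))
    print(ighv_status(98.0), ighv_status(97.9))
```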

Targeted NGS
Genetic mutations were detected by targeted NGS panels in 46 cases. Total DNA from PB, BM or LN samples was extracted for targeted NGS to detect gene mutations. Targeted NGS was performed using the Ion Torrent/Illumina NextSeq 550Dx platform. The panels used are summarized in Supplementary Table 2. To minimize potential artifacts, only somatic variants with a PASS filter, read depth DP≥500 and VAF≥2% were retained. The filter criteria included 1) nonsynonymous mutations in the exonic region, splicing mutations, and UTR mutations in NOTCH1; 2) allele frequency of the mutated gene < 0.01 in the global population in the 1000G, ExAC, and gnomAD databases; and 3) other gene mutations that were proven to have clinical significance. Particular attention was given to mutated TP53, NOTCH1, MYD88, SF3B1, ATM, and MYC.

Statistical analysis
Statistical analysis was performed in different groups of patients classified by distinct tests. Chi square and Fisher's exact tests were used to assess the associations between categorical variables. One-way ANOVA was used to assess intergroup differences. Statistically, a significant difference was confirmed when P<0.05 in a two-sided test. All analyses were performed using SPSS 23.0.
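For readers unfamiliar with Fisher's exact test, a minimal pure-Python sketch of the two-sided test on a 2×2 table follows. It is not the SPSS procedure used in this study; it sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-sided P value. The example table is the classic textbook "tea-tasting" layout, not data from this cohort.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution. The P value is the total probability of
    all tables whose probability does not exceed that of the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


if __name__ == "__main__":
    # Textbook example: 3/4 vs 1/4 correct calls.
    print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

A result below 0.05 would be reported as significant under the two-sided criterion stated above.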

Baseline characteristics of patients
Seventy-one patients met the inclusion criteria in our study. IGH gene sequencing was performed in all of these patients for the collection and analysis of patterns in V(D)J rearrangements and the extent of SHM. The median age was 61 years, with a range from 36 years to 82 years. Twenty-one females and 50 males were included. The ratio of gender was approximately 1:2, similar to previous reports (15). Among the 71 patients, 37 were newly diagnosed with CLL, in whom the Rai stage and Binet stage were assessed. Of the 37 patients, 9 were diagnosed with CLL Binet A stage, and 28 were diagnosed with Binet B-C stage. Only 2 patients were assessed as Rai stage 0, while the other 35 were assessed as Rai stage I-IV. Molecular features included the mutational status of the clonotypic IGHV gene, FISH aberrations, karyotyping, and immunophenotype analysis. Twenty-one (29.58%) of 71 patients carried an unmutated IGHV region, while 48 (67.61%) of 71 patients possessed a mutated IGHV region. This finding again verified that the frequency of mutated IGHV in Chinese CLL patients (M:U≈2:1) was higher than that in Western CLL patients (M:U≈1:1) (16). Among 50 patients in whom FISH was performed, 6 carried a del17p aberration, 3 carried an 11q deletion, 6 possessed an extra chromosome 12, 22 carried a del13q aberration, and 15 did not have any FISH aberrations. Karyotyping was performed in 33 patients, including 28 with < 3 aberrations and 5 with a complex karyotype. All baseline characteristics of the patients are summarized in Table 1.

IGHV, IGHD, and IGHJ gene usage in CLL patients
The frequency of IGHV, IGHD, and IGHJ gene usage is summarized in Table 2. Similar to previous studies (17), among IGHV genes used in CLL cases, the IGHV3 subgroup was most frequently used, followed by IGHV4 and IGHV1. IGHV4-34, IGHV3-23, IGHV3-11, IGHV1-3, and IGHV4-39 were the most frequently selected genes (Figure 1A). For IGHD genes, the IGHD3 and IGHD2 subgroups were preferentially selected. The IGHD3-22, IGHD3-10, and IGHD2-15 genes were the most frequently used genes (Figure 1B). Among the IGHJ genes, the IGHJ4 gene was most frequently used, followed by IGHJ3 and IGHJ5 (Figure 1C). Next, we assessed the associations between IGHV, IGHD, and IGHJ genes used in CLL patients and the mutational status of the clonotypic IGHV gene by Fisher's exact test. As shown in Table 2, all patients using the IGHJ3 gene carried a mutated IGHV region, a statistically significant association (10/10, P=0.0257), while the cases using the IGHJ6 gene preferentially possessed an unmutated IGHV region (9/15, P=0.0146). The significant overrepresentation of IGHJ6 in cases with an unmutated IGHV region was previously reported (18). IGHJ4 also tended to co-occur with the mutated IGHV region, but the correlation was not significant (25/31, P=0.0817). CLL patients carrying a member of the IGHD3 gene subgroup tended to possess an unmutated IGHV region, but the association was not significant. Additionally, clones carrying the IGHJ2 gene seemed to preferentially pair with an unmutated IGHV region (3/4). Next, we investigated the correlations between the IGHV, IGHD, and IGHJ genes used in our cohort and other molecular aberrations using the chi-square test or Fisher's exact test (Table 3). The molecular aberrations involved included the presence of trisomy 12, del(11q), del(13q), del(17p), complex karyotype, mutated TP53 gene, and unmutated IGHV region. No significant difference was found in the analysis involving IGHV gene subgroups.
Cases with BCRs rearranged using genes in the IGHV1 subgroup tended to carry a deletion of 11q, but the difference among IGHV groups was not significant (P=0.0592). Cases with BCRs rearranged using genes in the IGHD3 and IGHD5 subgroups carried significantly more del13q aberrations than the IGHD2 group (intergroup: P=0.0292; IGHD3 vs. IGHD2: P=0.039; IGHD5 vs. IGHD2: P=0.035). Cases with BCRs using genes from the IGHD3 or IGHD6 subgroups tended to pair with unmutated IGHV. IGHD6-expressing patients also exhibited a higher frequency of complex karyotypes than other IGHD groups. Surprisingly, 3 of the detected aberrations or statuses exhibited statistically significant differences among the IGHJ genes. Patients using the IGHJ3 gene exhibited significantly more trisomy 12 (3/6) than those using IGHJ4 (1/26, P=0.0149) or IGHJ5 (0/10, P=0.0357), and paired with unmutated IGHV at a significantly lower frequency. IGHJ5 and trisomy 12 seemed to be mutually exclusive. IGHJ6-expressing patients possessed significantly (P=0.0329) more del17p aberrations (3/8) than the IGHJ4 group (1/26). However, the specific clinical significance of the IGHJ genes themselves in CLL patients has not been clearly demonstrated. The presence of trisomy 12 and a mutated IGHV region were considered favorable prognostic markers in some studies (19,20); thus, we propose that the use of IGHJ3 genes possibly correlates with better outcomes in CLL. Similarly, the use of IGHD3 subgroups was correlated with del13q alone, which may indicate a better prognosis. Conversely, the use of IGHJ6 genes tended to co-occur with del17p and was possibly associated with TP53 mutation (Table 3), which may indicate an unfavorable outcome. The analysis of associations between clinical features and IGH gene subgroup usage was also performed, but no significant difference was found among the IGHV, IGHD, and IGHJ subgroups (Supplementary Table 3).
The intergroup analysis indicated significant differences in sex among IGHV groups and in Rai stage among IGHJ groups; however, when compared in pairs, there was no statistically significant difference.
We also analyzed the correlations between the mutational status of the clonotypic IGHV gene and cytogenetic aberrations (Figure 2). Del11q (4/4, P=0.0142) and del17p (6/6, P=0.0013) were significantly associated with the unmutated status of the IGHV region, which was in accordance with the fact that these factors represented a worse prognosis. Del13q alone was correlated with mutated IGHV status, but not significantly (17/22, P=0.0522), and both markers were considered favorable. Trisomy 12 was equally distributed in the IGHV-mutated and IGHV-unmutated groups. A complex karyotype was also significantly associated with the unmutated IGHV region (5/6, P=0.0184).
The correlation between the length of HCDR3 and the mutational status of the clonotypic IGHV gene was analyzed (n=68) by Student's t test (Figure 3A). Patients with unmutated IGHV had a significantly longer average length of the HCDR3 region than those with mutated IGHV (19.65 vs. 14.25, P<0.0001). Differences in HCDR3 length among IGHV, IGHD, and IGHJ subgroups were also calculated (Figures 3B-D). Significant differences in HCDR3 lengths among IGH subgroups were found in the analysis of IGHD (P=0.0228) and IGHJ (P<0.0001) genes. Patients using genes in the IGHD2 (mean: 17.23), IGHD3 (mean: 16.91) and IGHD6 (mean: 17.78) subgroups had significantly longer average lengths of HCDR3 compared with those using genes in the IGHD1 (mean: 12.29) and IGHD5 (mean: 13.00) subgroups. Similarly, patients using genes in IGHJ2 (mean: 19.75) and IGHJ6 (mean: 20.14) had significantly longer average lengths of HCDR3 than those using genes in IGHJ1 (mean: 16.00), IGHJ3 (mean: 14.22), IGHJ4 (mean: 14.07), and IGHJ5 (mean: 14.67). However, no significant difference was found in the analysis of IGHV subgroups.
CLL clones were also marked by BCR stereotypy. However, filtered by the Arrest/assign Tool, there were only two patients that carried a major-subset stereotyped BCR in our cohort. The two stereotyped BCRs, realigned IGHV4-4/IGHD5-24/IGHJ4 and IGHV4-39/IGHD6-13/IGHJ5, matched major subset #77 and major subset #8, respectively. The ratio between mutated versus unmutated IGHV genes possibly influenced the frequency of stereotyped BCRs, as previously reported (6). However, this ratio in our cohort was similar to that in previous studies. Other factors, such as inheritance, ethnicity, and regional differences, may contribute to the absence or at least the low frequency of stereotyped BCRs in this study.
We next investigated the correlations between the number of mutated genes and other prognostic markers, but no significant correlation was identified in any groups classified based on [...] Table 7).
FIGURE 2 Correlation of cytogenetic aberrations and mutational status of the clonotypic IGHV gene. The five cytogenetic aberrations detected included FISH aberrations (trisomy 12/Tri12, deletion of 11q/Del11q, deletion of 13q/Del13q, deletion of 17p/Del17p) and the aberrant karyotype (complex karyotype/CK). In the right panel, red columns represent cases with mutated IGHV regions, while pink columns represent cases with unmutated IGHV regions. The statistical analysis was performed using Fisher's exact test, α<0.05 (two-sided). Ns, nonsignificant. *P<0.05. **P<0.01.

The four genes with the highest mutation frequencies were considered in particular: IGLL5, NOTCH1, MYD88, and TP53. The association between these four mutated genes in CLL and molecular prognostic factors (n=22 in the analysis of FISH results, n=22 in the analysis of karyotyping results, and n=31 in the analysis of unmutated IGHV) was analyzed (Table 4). NOTCH1 mutation and TP53 mutation were significantly associated with del17p (P=0.0452 and P=0.0008, respectively). TP53 mutation was also significantly correlated with complex karyotype (P=0.0452), whereas its association with the unmutated IGHV region was not significant (P=0.1141). Mutated NOTCH1 was significantly related to the unmutated IGHV region (P=0.0095). These results further confirmed the adverse effects of mutated NOTCH1, mutated TP53, an unmutated IGHV region, and a complex karyotype in CLL. Conversely, the IGLL5 mutation, which had the highest frequency in our cohort, was significantly associated with a mutated IGHV gene (9/9). Moreover, most patients with IGLL5 mutations did not have other known driving aberrations, such as NOTCH1 mutations, SF3B1 mutations, or ATM mutations. A previous study also demonstrated the unique features of cases carrying IGLL5 mutations, including prominence in lower-risk mutated CLL, off-target activation-induced cytidine deaminase (AID) activity, and gene mutations previously undescribed (21). IGLL5 was also demonstrated to be frequently mutated in B-cell malignancies, such as Hodgkin's lymphoma (22), diffuse large B-cell lymphoma (DLBCL) (23), multiple myeloma (MM) (24), and CLL with translocations involving IGH (25). However, the exact mechanism of mutated IGLL5 in the tumorigenesis of B-cell malignancies is not yet clear. Our findings may suggest that mutated IGLL5 is correlated with lower-risk CLL pathogenesis.
The associations between the four mutated genes and baseline clinical features were also analyzed, but no significant correlations were found (Supplementary Table 8).

Associations between specific gene mutations and IGH gene subgroups used in CLL
We then analyzed the association between the four gene mutations (IGLL5, NOTCH1, MYD88, and TP53) and the corresponding IGH gene subgroups used in these patients (Table 5). Patients using IGHJ4 genes carried significantly fewer NOTCH1 mutations than those using other IGHJ genes (intergroup: P=0.0231). Patients using IGHV4 subgroups tended to carry more IGLL5 and MYD88 mutations, but the difference among IGHV subgroups was not significant.
TABLE 5 Correlation of specific mutated genes and usage of IGHV, IGHD subgroups and IGHJ genes. Column headings: IGHV subgroups and genes; mutated genes, n (cases with mutated genes/cases with specific IGHV subgroup/gene, %).

Discussion
During the process of B-cell maturation, the immunoglobulin genes in a B-cell undergo V(D)J recombination to produce a unique BCR for interaction with its specific antigen. The high proportion of stereotyped BCR major subsets (4,5), the significantly biased BCR gene usage compared with the normal repertoire (26), and the remarkable effect of BTK inhibitors (27,28) in CLL patients offer support for the theory that BCR signaling is involved in the tumorigenesis and development of CLL.
Several studies have stated explicitly that the subsets classified by IGH genes used in BCRs did differ in clinical courses and molecular features (3,5,29). These results have led some investigators to propose that these BCR components may help stratify CLL patients by risk level or identify groups with distinct clinical courses. The associations, mutual exclusivity, or differences among groups were analyzed through the collected information about cytogenetic and molecular aberrations and clinical profiles. The reliability of these analyses was supported by the strong correlation between cytogenetic aberrations and mutational status of the clonotypic IGHV gene (del11q and unmutated IGHV, del17p and unmutated IGHV, del13q and mutated IGHV, and complex karyotype and unmutated IGHV). Similarly, unfavorable gene mutations were significantly associated with unfavorable molecular prognostic factors (NOTCH1 mutation and del17p, TP53 mutation and del17p, NOTCH1 mutation and unmutated IGHV region). Surprisingly, the most frequent IGLL5 mutation was significantly associated with a mutated clonotypic IGHV gene and seemed to be independent of known driving aberrations in CLL. IGLL5 encodes an immunoglobulin lambda-like polypeptide. The second and third exons in IGLL5 make up the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segment (IGLJ1-C1). The sequencing analysis of IGH gene rearrangements used in our study did not cover immunoglobulin light chain genes. IGLL5 was assumed to act as a possible independent driver mutation in lower-risk CLL (21). Several studies have supported the adverse effect of mutated IGLL5, such as the pathogenesis of MM (24) and Burkitt lymphoma (30), IGH translocations in CLL (25), and association with R/R DLBCL (23). However, some studies have also demonstrated that the overexpression of IGLL5 affects immunological parameters, generating a more immune-suppressed tumor microenvironment (31).
Thus, the conclusion on the IGLL5 mutation is still ambiguous.
The associations between IGH genes used in BCRs and other factors in CLL were calculated. IGHJ3 genes were found to significantly correlate with mutated IGHV regions, while IGHJ6 genes were significantly associated with unmutated IGHV regions. The IGHJ6 gene, together with IGHD3-3, was also reported in a previous study of CLL to preferentially pair with unmutated IGHV and to correlate with a shorter time to first treatment (TTT) (32). Additionally, the IGHJ3 gene tended to co-occur with trisomy 12, while the IGHJ6 gene co-occurred with del17p more frequently. Based on these findings, we propose that the usage of IGHJ3 genes may help further support the prediction of a better prognosis, while IGHJ6 possibly tends to occur more frequently in cases with a worse prognosis. Patients expressing genes in the IGHD3 subgroup tended to carry more del13q aberrations than those expressing other IGHD genes, also indicating the likelihood of a favorable outcome. However, the prognostic effect of IGHJ5, which was mutually exclusive with trisomy 12 and NOTCH1 mutation, remains controversial. The influence of HCDR3 length was also analyzed. The unmutated IGHV region was significantly associated with a longer HCDR3 region. IGHD2, IGHD3, and IGHD6 constituted longer HCDR3 regions than IGHD1 and IGHD5. IGHJ2 and IGHJ6 constituted significantly longer HCDR3 regions than IGHJ1, IGHJ3, IGHJ4, and IGHJ5, which again provides evidence for the potential positive effect of IGHJ3 and negative effect of IGHJ6. There is still a lack of studies that have directly focused on IGHJ genes. The usage of IGHJ genes is commonly discussed together with IGHV genes in stereotyped cases, such as IGHV1-69 in combination with IGHJ6 (32) or IGHJ3 (33). However, in our study, the partner IGHV genes of the IGHJ6, IGHJ3 or IGHJ5 gene varied in IGHV family clans, IGHV subgroups and genes.
More experimental evidence is still needed to interpret our results on IGHJ genes since there are only 6 IGHJ genes compared to 45 IGHV genes. Whether the IGHJ gene, as a part of the HCDR3 region, also contributes to the risk stratification and prediction of prognosis in CLL is worthy of further discussion. Large-scale CLL studies containing information on IGH sequencing, other prognostic factors and follow-up survival time data will help clarify the exact correlation and corresponding mechanism.
There are still some limitations in our study. The first is the relatively small sample size due to the exclusion of patients who had incomplete or inconclusive IGH sequencing results. The cohort of patients was clinically heterogeneous at diagnosis (Binet A or Binet B-C), which could have influenced some of the results. Although significant associations or differences were found in the analysis of IGH genes and other prognostic factors in our study, more detailed analysis and validation may be accomplished in a study with a larger scale. Second, the statistical results in this study suggest that the IGHJ3 and IGHJ6 genes could have prognostic value; however, only a few studies have discussed the IGHJ genes themselves since these genes are not simply independent variables but are often associated with certain IGHV genes and their corresponding mutational status. One possible theory is that recombination signal sequences (RSSs) may impact the usage frequency of IGHJ by changing the lengths of the IGHJ6 spacer, the nonamer and heptamer of IGHJ3, and the nonamer of IGHJ5 (34). In contrast to IGHD gene usage, for which the utilization frequencies can be increased by deletion polymorphisms in the D locus, IGHJ gene usage was unaffected by polymorphisms of IGHD (35). Despite the inconclusive diagnostic value of IGHJ genes, they could be indirect indicators for prognosis. The low frequency of stereotyped BCR major subsets in our cohort is also intriguing. Studies on the IG repertoire of CLL demonstrated the existence of stereotyped BCR subsets with a frequency from 13.5% to 40% in patients (26); however, only two patients (2.82%) in our study were confirmed to carry a stereotyped BCR from subset #8 and subset #77. Analysis using the Arrest/assign Tool here only involved 19 major subsets that account for approximately 13.5% of stereotyped BCRs. 
We speculated that the frequency of stereotyped BCR subsets in total CLL cases differed among ethnic groups and regions (36)(37)(38) or that there existed more nonmajor subsets that were not identified by the Arrest/assign Tool. The last limitation is the missing data on clinical outcomes and follow-up visits due to loss to follow-up caused by the long time-scale and relatively indolent progression of most CLL cases. Possible longitudinal studies may help further delineate our conclusions.
In conclusion, by summarizing a series of information on cytogenetic and genetic aberrations and analyzing the frequency and distribution of IGH genes, we displayed an integrative and descriptive landscape of clinical and molecular profiles. Furthermore, the evaluation of the predictive value of IGH V(D)J recombination and the gene usage frequency was completed by analyzing the associations or mutual exclusivity among IGH genes and other clinical or molecular prognostic factors. We validated the favorable or unfavorable effects of the already known factors and further demonstrated the possible interactions of IGHV, IGHD, and IGHJ genes with these indicators and the involvement of IGH gene rearrangements in the initiation or development of CLL.

Data availability statement
The data presented in the study are deposited in the SRA repository, accession number "PRJNA930566". This project we created will be released in September 2023.

Ethics statement
The studies involving human participants were reviewed and approved by Medical Ethics Committee of Tongji Hospital, Tongji Medical College of HUST (Approval number: TJ-IRB20200707). The ethics committee waived the requirement of written informed consent for participation.

Author contributions
XD conceived, drafted the manuscript and drew the figures. MZ collected the statistics and discussed the manuscript. JW revised the manuscript. MX and XZ provided guidance and approved the version to be submitted. All authors contributed to the article and approved the submitted version.

Funding
This work was supported by the National Natural Science Foundation of China (No. 81770211 to MX).